Dataset statistics
| Number of variables | 14 |
|---|---|
| Number of observations | 891221 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 87962 |
| Duplicate rows (%) | 9.9% |
| Total size in memory | 95.2 MiB |
| Average record size in memory | 112.0 B |
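The report does not define "Duplicate rows", and there are two common readings. Take the "Most frequently occurring" table later in this report. Its six largest groups alone contain 73960 + 4109 + 3284 + 2645 + 2529 + 2330 = 88857 redundant copies (group size minus one), which already exceeds 87962. So the figure here appears to count distinct row patterns that occur more than once, not redundant rows. The minimal sketch below computes both quantities so they can be compared. It uses plain Python, and the helper names are hypothetical, not part of pandas-profiling. The pandas equivalents named in the comments are `DataFrame.duplicated` with `keep="first"` and `keep=False`.

```python
from collections import Counter

def duplicate_summary(rows):
    """Count duplicates in a sequence of equal-length rows.

    Returns (redundant, involved, patterns):
      redundant - rows identical to an earlier row
                  (pandas: df.duplicated(keep="first").sum())
      involved  - every row whose pattern occurs more than once
                  (pandas: df.duplicated(keep=False).sum())
      patterns  - distinct row patterns that occur more than once
    """
    counts = Counter(tuple(row) for row in rows)
    repeated = [n for n in counts.values() if n > 1]
    return sum(n - 1 for n in repeated), sum(repeated), len(repeated)

def most_frequent_patterns(rows, k=10):
    """Top-k repeated patterns with their group size (like '# duplicates')."""
    counts = Counter(tuple(row) for row in rows)
    return [(pattern, n) for pattern, n in counts.most_common(k) if n > 1]

rows = [(6, 3), (6, 3), (6, 3), (2, 4), (2, 4), (1, 1)]
print(duplicate_summary(rows))       # (3, 5, 2)
print(most_frequent_patterns(rows))  # [((6, 3), 3), ((2, 4), 2)]
```

On real data, comparing the third number with the first shows which reading a reported "Duplicate rows" value follows.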
Variable types
| Numeric | 14 |
|---|---|
Warnings
| Dataset has 87962 (9.9%) duplicate rows | Duplicates |
|---|---|
| SEMIO_DOM is highly correlated with SEMIO_KAEM and 4 other fields | High correlation |
| SEMIO_ERL is highly correlated with SEMIO_FAM and 2 other fields | High correlation |
| SEMIO_FAM is highly correlated with SEMIO_ERL and 3 other fields | High correlation |
| SEMIO_KAEM is highly correlated with SEMIO_DOM and 5 other fields | High correlation |
| SEMIO_KRIT is highly correlated with SEMIO_DOM and 4 other fields | High correlation |
| SEMIO_KULT is highly correlated with SEMIO_DOM and 7 other fields | High correlation |
| SEMIO_MAT is highly correlated with SEMIO_REL | High correlation |
| SEMIO_PFLICHT is highly correlated with SEMIO_RAT and 2 other fields | High correlation |
| SEMIO_RAT is highly correlated with SEMIO_PFLICHT and 1 other field | High correlation |
| SEMIO_REL is highly correlated with SEMIO_ERL and 5 other fields | High correlation |
| SEMIO_SOZ is highly correlated with SEMIO_DOM and 4 other fields | High correlation |
| SEMIO_TRADV is highly correlated with SEMIO_PFLICHT and 2 other fields | High correlation |
| SEMIO_VERT is highly correlated with SEMIO_DOM and 4 other fields | High correlation |
| SEMIO_DOM is highly correlated with SEMIO_KAEM and 1 other field | High correlation |
| SEMIO_ERL is highly correlated with SEMIO_FAM and 1 other field | High correlation |
| SEMIO_FAM is highly correlated with SEMIO_ERL and 2 other fields | High correlation |
| SEMIO_KAEM is highly correlated with SEMIO_DOM and 3 other fields | High correlation |
| SEMIO_KRIT is highly correlated with SEMIO_KAEM and 1 other field | High correlation |
| SEMIO_KULT is highly correlated with SEMIO_FAM and 1 other field | High correlation |
| SEMIO_PFLICHT is highly correlated with SEMIO_RAT and 1 other field | High correlation |
| SEMIO_RAT is highly correlated with SEMIO_PFLICHT | High correlation |
| SEMIO_REL is highly correlated with SEMIO_ERL and 2 other fields | High correlation |
| SEMIO_SOZ is highly correlated with SEMIO_VERT | High correlation |
| SEMIO_VERT is highly correlated with SEMIO_DOM and 3 other fields | High correlation |
| SEMIO_KULT is highly correlated with SEMIO_FAM and 12 other fields | High correlation |
| SEMIO_FAM is highly correlated with SEMIO_KULT and 12 other fields | High correlation |
| SEMIO_PFLICHT is highly correlated with SEMIO_KULT and 12 other fields | High correlation |
| SEMIO_MAT is highly correlated with SEMIO_KULT and 12 other fields | High correlation |
| SEMIO_SOZ is highly correlated with SEMIO_KULT and 12 other fields | High correlation |
| SEMIO_KAEM is highly correlated with SEMIO_KULT and 12 other fields | High correlation |
| SEMIO_KRIT is highly correlated with SEMIO_KULT and 12 other fields | High correlation |
| SEMIO_TRADV is highly correlated with SEMIO_KULT and 12 other fields | High correlation |
| SEMIO_DOM is highly correlated with SEMIO_KULT and 12 other fields | High correlation |
| SEMIO_RAT is highly correlated with SEMIO_KULT and 12 other fields | High correlation |
| SEMIO_REL is highly correlated with SEMIO_KULT and 12 other fields | High correlation |
| SEMIO_LUST is highly correlated with SEMIO_KULT and 12 other fields | High correlation |
| SEMIO_ERL is highly correlated with SEMIO_KULT and 12 other fields | High correlation |
| SEMIO_VERT is highly correlated with SEMIO_KULT and 12 other fields | High correlation |
Reproduction
| Analysis started | 2021-05-17 14:08:13.455196 |
|---|---|
| Analysis finished | 2021-05-17 14:10:05.722150 |
| Duration | 1 minute and 52.27 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
Variables
SEMIO_DOM
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.667550473 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 5 |
| Q3 | 6 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 3 |
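The quantiles above can be reproduced from the value counts alone, without expanding all 891221 rows. This is a minimal sketch. It assumes linear interpolation between neighbouring order statistics, which is the default of `numpy.percentile` and `pandas.Series.quantile`; the report itself does not name its method. The helper names are hypothetical, and `FREQ` holds this variable's counts as listed in its frequency tables.

```python
import math

# Value counts for this variable, copied from its frequency tables.
FREQ = {1: 44762, 2: 101498, 3: 97027, 4: 125115, 5: 177889, 6: 183435, 7: 161495}

def value_at(freq, pos):
    """Value at 0-based position `pos` of the sorted data described by freq."""
    seen = 0
    for value in sorted(freq):
        seen += freq[value]
        if pos < seen:
            return value
    raise IndexError("position beyond the number of observations")

def quantile(freq, q):
    """Quantile with linear interpolation between neighbouring order statistics."""
    n = sum(freq.values())
    h = (n - 1) * q                 # fractional 0-based position of the quantile
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    x_lo, x_hi = value_at(freq, lo), value_at(freq, hi)
    return x_lo + (h - lo) * (x_hi - x_lo)

for q in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(q, quantile(FREQ, q))
```

For this variable the loop gives 1, 3, 5, 6 and 7 (printed as floats), matching the table, and the IQR is 6 - 3 = 3.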
Descriptive statistics
| Standard deviation | 1.79571208 |
|---|---|
| Coefficient of variation (CV) | 0.3847225843 |
| Kurtosis | -0.9093931002 |
| Mean | 4.667550473 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.4131770759 |
| Sum | 4159819 |
| Variance | 3.224581876 |
| Monotonicity | Not monotonic |
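The descriptive statistics can also be recomputed from the value counts. This sketch assumes pandas-style estimators: the sample variance with an n - 1 denominator (`Series.var` default), the adjusted skewness and bias-corrected excess kurtosis that `Series.skew` and `Series.kurt` return, and the median absolute deviation taken as the median of |x - median|. The function names are hypothetical. At n = 891221 the bias corrections move skewness and kurtosis only around the sixth decimal, so results should agree with the table to at least four decimals.

```python
import math

FREQ = {1: 44762, 2: 101498, 3: 97027, 4: 125115, 5: 177889, 6: 183435, 7: 161495}

def describe_counts(freq):
    """Moment statistics from {value: count} without expanding the data."""
    n = sum(freq.values())
    total = sum(v * c for v, c in freq.items())
    mean = total / n
    # Sums of count * (value - mean) ** k for k = 2, 3, 4.
    s2, s3, s4 = [sum(c * (v - mean) ** k for v, c in freq.items()) for k in (2, 3, 4)]
    variance = s2 / (n - 1)                 # sample variance (n - 1 denominator)
    std = math.sqrt(variance)
    m2, m3, m4 = s2 / n, s3 / n, s4 / n     # population central moments
    g1 = m3 / m2 ** 1.5                     # population skewness
    g2 = m4 / m2 ** 2 - 3                   # population excess kurtosis
    return {
        "sum": total,
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,
        "skewness": g1 * math.sqrt(n * (n - 1)) / (n - 2),
        "kurtosis": ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
    }

def median_counts(freq):
    """Median of the data described by {value: count}."""
    n = sum(freq.values())
    ordered = sorted(freq.items())

    def at(pos):
        seen = 0
        for value, count in ordered:
            seen += count
            if pos < seen:
                return value
        raise IndexError(pos)

    return (at((n - 1) // 2) + at(n // 2)) / 2

def mad_counts(freq):
    """Median absolute deviation: the median of |x - median(x)|."""
    med = median_counts(freq)
    deviations = {}
    for value, count in freq.items():
        d = abs(value - med)
        deviations[d] = deviations.get(d, 0) + count
    return median_counts(deviations)

print(describe_counts(FREQ))
print(mad_counts(FREQ))
```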
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 183435 | 20.6% |
| 5 | 177889 | 20.0% |
| 7 | 161495 | 18.1% |
| 4 | 125115 | 14.0% |
| 2 | 101498 | 11.4% |
| 3 | 97027 | 10.9% |
| 1 | 44762 | 5.0% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 44762 | 5.0% |
| 2 | 101498 | 11.4% |
| 3 | 97027 | 10.9% |
| 4 | 125115 | 14.0% |
| 5 | 177889 | 20.0% |
| 6 | 183435 | 20.6% |
| 7 | 161495 | 18.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 161495 | 18.1% |
| 6 | 183435 | 20.6% |
| 5 | 177889 | 20.0% |
| 4 | 125115 | 14.0% |
| 3 | 97027 | 10.9% |
| 2 | 101498 | 11.4% |
| 1 | 44762 | 5.0% |
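The histogram has a minimum of 1, a maximum of 7 and bins=7, so each bin is 6/7 wide. Every integer value therefore lands in a bin of its own, and the histogram carries the same information as the frequency tables. The Frequency (%) column is the count divided by the 891221 observations; for example, 44762 / 891221 ≈ 5.0%. This is a minimal sketch that assumes equal-width bins as `numpy.histogram` builds them; the report does not state its plotting routine. The function names are hypothetical.

```python
def fixed_width_histogram(values, bins):
    """Counts per equal-width bin between min and max (last bin closed)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are equal; bin width would be zero")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # Bin i covers [lo + i*width, lo + (i+1)*width); the maximum would land
        # one past the end, so it is clamped into the last (closed) bin.
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return counts

def frequency_pct(count, n):
    """Share of observations, formatted like the Frequency (%) column."""
    return f"{100 * count / n:.1f}%"

print(fixed_width_histogram([1, 2, 3, 4, 5, 6, 7, 7], bins=7))  # [1, 1, 1, 1, 1, 1, 2]
print(frequency_pct(44762, 891221))                              # 5.0%
```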
SEMIO_ERL
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.481404725 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3 |
| median | 4 |
| Q3 | 6 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.807551899 |
|---|---|
| Coefficient of variation (CV) | 0.4033449354 |
| Kurtosis | -1.105118991 |
| Mean | 4.481404725 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.04328868758 |
| Sum | 3993922 |
| Variance | 3.267243868 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 196206 | 22.0% |
| 3 | 180824 | 20.3% |
| 7 | 179141 | 20.1% |
| 6 | 139209 | 15.6% |
| 2 | 77012 | 8.6% |
| 5 | 76133 | 8.5% |
| 1 | 42696 | 4.8% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 42696 | 4.8% |
| 2 | 77012 | 8.6% |
| 3 | 180824 | 20.3% |
| 4 | 196206 | 22.0% |
| 5 | 76133 | 8.5% |
| 6 | 139209 | 15.6% |
| 7 | 179141 | 20.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 179141 | 20.1% |
| 6 | 139209 | 15.6% |
| 5 | 76133 | 8.5% |
| 4 | 196206 | 22.0% |
| 3 | 180824 | 20.3% |
| 2 | 77012 | 8.6% |
| 1 | 42696 | 4.8% |
SEMIO_FAM
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.272729211 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 4 |
| Q3 | 6 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.915885028 |
|---|---|
| Coefficient of variation (CV) | 0.4483984202 |
| Kurtosis | -1.19892824 |
| Mean | 4.272729211 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.2058485462 |
| Sum | 3807946 |
| Variance | 3.67061544 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 186729 | 21.0% |
| 2 | 139562 | 15.7% |
| 4 | 135942 | 15.3% |
| 5 | 133740 | 15.0% |
| 7 | 118517 | 13.3% |
| 3 | 94815 | 10.6% |
| 1 | 81916 | 9.2% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 81916 | 9.2% |
| 2 | 139562 | 15.7% |
| 3 | 94815 | 10.6% |
| 4 | 135942 | 15.3% |
| 5 | 133740 | 15.0% |
| 6 | 186729 | 21.0% |
| 7 | 118517 | 13.3% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 118517 | 13.3% |
| 6 | 186729 | 21.0% |
| 5 | 133740 | 15.0% |
| 4 | 135942 | 15.3% |
| 3 | 94815 | 10.6% |
| 2 | 139562 | 15.7% |
| 1 | 81916 | 9.2% |
SEMIO_KAEM
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.445007467 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 5 |
| Q3 | 6 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.852412242 |
|---|---|
| Coefficient of variation (CV) | 0.4167399618 |
| Kurtosis | -1.236079929 |
| Mean | 4.445007467 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.1927373604 |
| Sum | 3961484 |
| Variance | 3.431431115 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 206001 | 23.1% |
| 3 | 180955 | 20.3% |
| 7 | 135579 | 15.2% |
| 5 | 128501 | 14.4% |
| 2 | 114038 | 12.8% |
| 4 | 78944 | 8.9% |
| 1 | 47203 | 5.3% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 47203 | 5.3% |
| 2 | 114038 | 12.8% |
| 3 | 180955 | 20.3% |
| 4 | 78944 | 8.9% |
| 5 | 128501 | 14.4% |
| 6 | 206001 | 23.1% |
| 7 | 135579 | 15.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 135579 | 15.2% |
| 6 | 206001 | 23.1% |
| 5 | 128501 | 14.4% |
| 4 | 78944 | 8.9% |
| 3 | 180955 | 20.3% |
| 2 | 114038 | 12.8% |
| 1 | 47203 | 5.3% |
SEMIO_KRIT
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.76322259 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 5 |
| Q3 | 6 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.830789358 |
|---|---|
| Coefficient of variation (CV) | 0.3843593961 |
| Kurtosis | -0.8752505716 |
| Mean | 4.76322259 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.3882238006 |
| Sum | 4245084 |
| Variance | 3.351789675 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 219847 | 24.7% |
| 5 | 156298 | 17.5% |
| 4 | 144079 | 16.2% |
| 6 | 133049 | 14.9% |
| 3 | 129106 | 14.5% |
| 1 | 54947 | 6.2% |
| 2 | 53895 | 6.0% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 54947 | 6.2% |
| 2 | 53895 | 6.0% |
| 3 | 129106 | 14.5% |
| 4 | 144079 | 16.2% |
| 5 | 156298 | 17.5% |
| 6 | 133049 | 14.9% |
| 7 | 219847 | 24.7% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 219847 | 24.7% |
| 6 | 133049 | 14.9% |
| 5 | 156298 | 17.5% |
| 4 | 144079 | 16.2% |
| 3 | 129106 | 14.5% |
| 2 | 53895 | 6.0% |
| 1 | 54947 | 6.2% |
SEMIO_KULT
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.025013998 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 4 |
| Q3 | 5 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.90381623 |
|---|---|
| Coefficient of variation (CV) | 0.4729961763 |
| Kurtosis | -1.05018586 |
| Mean | 4.025013998 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.03536077331 |
| Sum | 3587177 |
| Variance | 3.624516239 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 209067 | 23.5% |
| 5 | 176282 | 19.8% |
| 1 | 128216 | 14.4% |
| 7 | 117378 | 13.2% |
| 4 | 101502 | 11.4% |
| 6 | 101286 | 11.4% |
| 2 | 57490 | 6.5% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 128216 | 14.4% |
| 2 | 57490 | 6.5% |
| 3 | 209067 | 23.5% |
| 4 | 101502 | 11.4% |
| 5 | 176282 | 19.8% |
| 6 | 101286 | 11.4% |
| 7 | 117378 | 13.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 117378 | 13.2% |
| 6 | 101286 | 11.4% |
| 5 | 176282 | 19.8% |
| 4 | 101502 | 11.4% |
| 3 | 209067 | 23.5% |
| 2 | 57490 | 6.5% |
| 1 | 128216 | 14.4% |
SEMIO_LUST
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.359086018 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 5 |
| Q3 | 6 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.022829383 |
|---|---|
| Coefficient of variation (CV) | 0.4640489714 |
| Kurtosis | -1.207111475 |
| Mean | 4.359086018 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.3030834485 |
| Sum | 3884909 |
| Variance | 4.091838712 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 170040 | 19.1% |
| 6 | 158624 | 17.8% |
| 7 | 158234 | 17.8% |
| 2 | 114373 | 12.8% |
| 1 | 110382 | 12.4% |
| 4 | 97495 | 10.9% |
| 3 | 82073 | 9.2% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 110382 | 12.4% |
| 2 | 114373 | 12.8% |
| 3 | 82073 | 9.2% |
| 4 | 97495 | 10.9% |
| 5 | 170040 | 19.1% |
| 6 | 158624 | 17.8% |
| 7 | 158234 | 17.8% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 158234 | 17.8% |
| 6 | 158624 | 17.8% |
| 5 | 170040 | 19.1% |
| 4 | 97495 | 10.9% |
| 3 | 82073 | 9.2% |
| 2 | 114373 | 12.8% |
| 1 | 110382 | 12.4% |
SEMIO_MAT
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.001596686 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 4 |
| Q3 | 5 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.857540107 |
|---|---|
| Coefficient of variation (CV) | 0.4641997315 |
| Kurtosis | -1.036445759 |
| Mean | 4.001596686 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.01186754272 |
| Sum | 3566307 |
| Variance | 3.45045525 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 171267 | 19.2% |
| 4 | 162862 | 18.3% |
| 2 | 134549 | 15.1% |
| 3 | 123701 | 13.9% |
| 7 | 111976 | 12.6% |
| 1 | 97341 | 10.9% |
| 6 | 89525 | 10.0% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 97341 | 10.9% |
| 2 | 134549 | 15.1% |
| 3 | 123701 | 13.9% |
| 4 | 162862 | 18.3% |
| 5 | 171267 | 19.2% |
| 6 | 89525 | 10.0% |
| 7 | 111976 | 12.6% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 111976 | 12.6% |
| 6 | 89525 | 10.0% |
| 5 | 171267 | 19.2% |
| 4 | 162862 | 18.3% |
| 3 | 123701 | 13.9% |
| 2 | 134549 | 15.1% |
| 1 | 97341 | 10.9% |
SEMIO_PFLICHT
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.256075654 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 4 |
| Q3 | 6 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.770136694 |
|---|---|
| Coefficient of variation (CV) | 0.4159081836 |
| Kurtosis | -0.8653655223 |
| Mean | 4.256075654 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.1694070752 |
| Sum | 3793104 |
| Variance | 3.133383917 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 203845 | 22.9% |
| 4 | 162117 | 18.2% |
| 3 | 133990 | 15.0% |
| 7 | 115458 | 13.0% |
| 6 | 109442 | 12.3% |
| 2 | 92214 | 10.3% |
| 1 | 74155 | 8.3% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 74155 | 8.3% |
| 2 | 92214 | 10.3% |
| 3 | 133990 | 15.0% |
| 4 | 162117 | 18.2% |
| 5 | 203845 | 22.9% |
| 6 | 109442 | 12.3% |
| 7 | 115458 | 13.0% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 115458 | 13.0% |
| 6 | 109442 | 12.3% |
| 5 | 203845 | 22.9% |
| 4 | 162117 | 18.2% |
| 3 | 133990 | 15.0% |
| 2 | 92214 | 10.3% |
| 1 | 74155 | 8.3% |
SEMIO_RAT
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.910139012 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 4 |
| Q3 | 5 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.580305974 |
|---|---|
| Coefficient of variation (CV) | 0.404155957 |
| Kurtosis | -0.3831343065 |
| Mean | 3.910139012 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.2879717072 |
| Sum | 3484798 |
| Variance | 2.497366972 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 334456 | 37.5% |
| 2 | 140433 | 15.8% |
| 3 | 131994 | 14.8% |
| 5 | 89056 | 10.0% |
| 7 | 87024 | 9.8% |
| 6 | 61484 | 6.9% |
| 1 | 46774 | 5.2% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 46774 | 5.2% |
| 2 | 140433 | 15.8% |
| 3 | 131994 | 14.8% |
| 4 | 334456 | 37.5% |
| 5 | 89056 | 10.0% |
| 6 | 61484 | 6.9% |
| 7 | 87024 | 9.8% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 87024 | 9.8% |
| 6 | 61484 | 6.9% |
| 5 | 89056 | 10.0% |
| 4 | 334456 | 37.5% |
| 3 | 131994 | 14.8% |
| 2 | 140433 | 15.8% |
| 1 | 46774 | 5.2% |
SEMIO_REL
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.240609232 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 4 |
| Q3 | 6 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 2.007372556 |
|---|---|
| Coefficient of variation (CV) | 0.4733689067 |
| Kurtosis | -1.134701252 |
| Mean | 4.240609232 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.00215072809 |
| Sum | 3779320 |
| Variance | 4.029544578 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 211377 | 23.7% |
| 4 | 207128 | 23.2% |
| 3 | 150801 | 16.9% |
| 1 | 108130 | 12.1% |
| 5 | 79566 | 8.9% |
| 2 | 73127 | 8.2% |
| 6 | 61092 | 6.9% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 108130 | 12.1% |
| 2 | 73127 | 8.2% |
| 3 | 150801 | 16.9% |
| 4 | 207128 | 23.2% |
| 5 | 79566 | 8.9% |
| 6 | 61092 | 6.9% |
| 7 | 211377 | 23.7% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 211377 | 23.7% |
| 6 | 61092 | 6.9% |
| 5 | 79566 | 8.9% |
| 4 | 207128 | 23.2% |
| 3 | 150801 | 16.9% |
| 2 | 73127 | 8.2% |
| 1 | 108130 | 12.1% |
SEMIO_SOZ
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.945859669 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 4 |
| Q3 | 6 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 1.946564233 |
|---|---|
| Coefficient of variation (CV) | 0.4933181603 |
| Kurtosis | -1.353534476 |
| Mean | 3.945859669 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.1789455842 |
| Sum | 3516633 |
| Variance | 3.789112312 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 244714 | 27.5% |
| 6 | 136205 | 15.3% |
| 5 | 121786 | 13.7% |
| 3 | 118889 | 13.3% |
| 7 | 117378 | 13.2% |
| 4 | 90161 | 10.1% |
| 1 | 62088 | 7.0% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 62088 | 7.0% |
| 2 | 244714 | 27.5% |
| 3 | 118889 | 13.3% |
| 4 | 90161 | 10.1% |
| 5 | 121786 | 13.7% |
| 6 | 136205 | 15.3% |
| 7 | 117378 | 13.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 117378 | 13.2% |
| 6 | 136205 | 15.3% |
| 5 | 121786 | 13.7% |
| 4 | 90161 | 10.1% |
| 3 | 118889 | 13.3% |
| 2 | 244714 | 27.5% |
| 1 | 62088 | 7.0% |
SEMIO_TRADV
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.661784226 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.707636767 |
|---|---|
| Coefficient of variation (CV) | 0.4663400849 |
| Kurtosis | -0.655924441 |
| Mean | 3.661784226 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.3343106362 |
| Sum | 3263459 |
| Variance | 2.916023328 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 226571 | 25.4% |
| 4 | 174203 | 19.5% |
| 2 | 132657 | 14.9% |
| 5 | 117378 | 13.2% |
| 1 | 96775 | 10.9% |
| 7 | 76133 | 8.5% |
| 6 | 67504 | 7.6% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 96775 | 10.9% |
| 2 | 132657 | 14.9% |
| 3 | 226571 | 25.4% |
| 4 | 174203 | 19.5% |
| 5 | 117378 | 13.2% |
| 6 | 67504 | 7.6% |
| 7 | 76133 | 8.5% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 76133 | 8.5% |
| 6 | 67504 | 7.6% |
| 5 | 117378 | 13.2% |
| 4 | 174203 | 19.5% |
| 3 | 226571 | 25.4% |
| 2 | 132657 | 14.9% |
| 1 | 96775 | 10.9% |
SEMIO_VERT
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.023709046 |
| Minimum | 1 |
|---|---|
| Maximum | 7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 4 |
| Q3 | 6 |
| 95-th percentile | 7 |
| Maximum | 7 |
| Range | 6 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.077746254 |
|---|---|
| Coefficient of variation (CV) | 0.5163758687 |
| Kurtosis | -1.411240267 |
| Mean | 4.023709046 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | -0.03560142861 |
| Sum | 3586014 |
| Variance | 4.317029496 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=7)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 204333 | 22.9% |
| 6 | 141714 | 15.9% |
| 5 | 135205 | 15.2% |
| 7 | 134756 | 15.1% |
| 4 | 122982 | 13.8% |
| 1 | 120437 | 13.5% |
| 3 | 31794 | 3.6% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 120437 | 13.5% |
| 2 | 204333 | 22.9% |
| 3 | 31794 | 3.6% |
| 4 | 122982 | 13.8% |
| 5 | 135205 | 15.2% |
| 6 | 141714 | 15.9% |
| 7 | 134756 | 15.1% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 134756 | 15.1% |
| 6 | 141714 | 15.9% |
| 5 | 135205 | 15.2% |
| 4 | 122982 | 13.8% |
| 3 | 31794 | 3.6% |
| 2 | 204333 | 22.9% |
| 1 | 120437 | 13.5% |
Correlations
Pearson's r
The Pearson correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 no linear correlation, and +1 total positive linear correlation. r is also invariant under separate changes in location and (positive) scale of the two variables. So for an exactly linear relationship, the steepness of the line (its angle to the x-axis) does not affect r; only the sign of the slope does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
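The definition translates directly into code. This is a minimal pure-Python sketch, not the report's own implementation, and `pearson_r` is a hypothetical name. The usage lines show the invariance described above: two lines with very different positive slopes both give r = 1, while a monotonic but curved relationship gives r < 1.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a variable is constant")
    # The common 1/(n - 1) factors of the covariance and both standard
    # deviations cancel, so plain sums are enough.
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [3 * a + 2 for a in x]))    # steep line: 1 up to rounding
print(pearson_r(x, [0.1 * a - 7 for a in x]))  # shallow line: also 1 up to rounding
print(pearson_r(x, [a ** 3 for a in x]))       # monotonic but not linear: below 1
```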
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 no monotonic correlation, and +1 total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
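Every variable in this dataset takes only seven values, so ties dominate and the tie rule for ranks matters. The sketch below gives tied values the mean of the positions they occupy; this is the "average" method, the default of `pandas.Series.rank` and `scipy.stats.rankdata`. It then applies Pearson's formula to the ranks. The function names are hypothetical, and this is an illustration rather than the report's code.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [a ** 3 for a in x]))  # 1 up to rounding: perfectly monotonic
print(average_ranks([4, 7, 7, 2]))           # [2.0, 3.5, 3.5, 1.0]
```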
Kendall's τ
Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 no correlation, and +1 total positive correlation. To calculate τ for two variables X and Y, classify every pair of observations as concordant or discordant. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the τ-a form; when values are tied, the τ-b variant reduces the denominator to account for the tied pairs.
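With only seven distinct values per variable, many pairs are tied. In that case τ-a cannot reach ±1 even for perfectly ordered data, which is why the tie-corrected τ-b matters. τ-b is the variant `scipy.stats.kendalltau` computes by default. The brute-force sketch below counts pairs directly and returns both variants; `kendall_tau` is a hypothetical name. It is O(n²) and only suitable for small illustrations: 891221 rows would need an O(n log n) algorithm such as the one SciPy uses.

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Return (tau_a, tau_b) by counting concordant and discordant pairs, O(n^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(n), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            tied_x += 1        # pair tied in x (used by the tau-b correction)
        if dy == 0:
            tied_y += 1        # pair tied in y
        if dx * dy > 0:
            concordant += 1    # both variables move in the same direction
        elif dx * dy < 0:
            discordant += 1    # they move in opposite directions
    total = n * (n - 1) // 2
    tau_a = (concordant - discordant) / total
    denom = math.sqrt((total - tied_x) * (total - tied_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    tau_b = (concordant - discordant) / denom
    return tau_a, tau_b

print(kendall_tau([1, 2, 3], [1, 3, 2]))        # no ties: both equal 1/3
print(kendall_tau([1, 1, 2, 3], [1, 2, 2, 3]))  # ties: tau_a = 2/3, tau_b = 0.8
```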
Phik (φk)
Phik (φk) is a recently introduced correlation coefficient that works consistently across categorical, ordinal and interval variables. It captures non-linear dependency and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available in the phik package documentation.
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | SEMIO_DOM | SEMIO_ERL | SEMIO_FAM | SEMIO_KAEM | SEMIO_KRIT | SEMIO_KULT | SEMIO_LUST | SEMIO_MAT | SEMIO_PFLICHT | SEMIO_RAT | SEMIO_REL | SEMIO_SOZ | SEMIO_TRADV | SEMIO_VERT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 3 | 6 | 6 | 7 | 3 | 5 | 5 | 5 | 4 | 7 | 2 | 3 | 1 |
| 1 | 7 | 2 | 4 | 4 | 4 | 3 | 2 | 3 | 7 | 6 | 4 | 5 | 6 | 1 |
| 2 | 7 | 6 | 1 | 7 | 7 | 3 | 4 | 3 | 3 | 4 | 3 | 4 | 3 | 4 |
| 3 | 4 | 7 | 1 | 5 | 4 | 4 | 4 | 1 | 4 | 3 | 2 | 5 | 4 | 4 |
| 4 | 2 | 4 | 4 | 2 | 3 | 6 | 4 | 2 | 4 | 2 | 4 | 6 | 2 | 7 |
| 5 | 4 | 2 | 4 | 4 | 4 | 5 | 2 | 4 | 7 | 7 | 7 | 2 | 6 | 2 |
| 6 | 4 | 5 | 5 | 7 | 7 | 5 | 6 | 7 | 7 | 7 | 5 | 2 | 7 | 2 |
| 7 | 1 | 2 | 7 | 2 | 1 | 7 | 2 | 5 | 5 | 5 | 7 | 7 | 5 | 6 |
| 8 | 5 | 4 | 5 | 3 | 5 | 5 | 6 | 1 | 1 | 2 | 4 | 4 | 4 | 5 |
| 9 | 6 | 6 | 1 | 7 | 7 | 3 | 6 | 3 | 1 | 4 | 1 | 2 | 3 | 2 |
Last rows
| | SEMIO_DOM | SEMIO_ERL | SEMIO_FAM | SEMIO_KAEM | SEMIO_KRIT | SEMIO_KULT | SEMIO_LUST | SEMIO_MAT | SEMIO_PFLICHT | SEMIO_RAT | SEMIO_REL | SEMIO_SOZ | SEMIO_TRADV | SEMIO_VERT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 891211 | 5 | 4 | 2 | 2 | 5 | 5 | 6 | 6 | 3 | 5 | 4 | 4 | 4 | 5 |
| 891212 | 3 | 7 | 4 | 2 | 3 | 6 | 7 | 6 | 1 | 1 | 3 | 6 | 1 | 6 |
| 891213 | 5 | 7 | 2 | 6 | 4 | 2 | 5 | 5 | 2 | 3 | 4 | 2 | 3 | 3 |
| 891214 | 7 | 4 | 4 | 6 | 4 | 2 | 2 | 3 | 5 | 6 | 4 | 5 | 6 | 1 |
| 891215 | 7 | 5 | 5 | 7 | 6 | 5 | 6 | 7 | 6 | 7 | 5 | 2 | 7 | 2 |
| 891216 | 7 | 6 | 1 | 5 | 4 | 3 | 1 | 3 | 4 | 4 | 3 | 2 | 2 | 2 |
| 891217 | 4 | 7 | 4 | 4 | 4 | 4 | 7 | 5 | 6 | 4 | 7 | 4 | 2 | 4 |
| 891218 | 4 | 5 | 2 | 5 | 4 | 5 | 3 | 3 | 6 | 7 | 5 | 5 | 7 | 2 |
| 891219 | 2 | 2 | 7 | 2 | 2 | 7 | 3 | 5 | 7 | 5 | 7 | 7 | 5 | 6 |
| 891220 | 3 | 3 | 6 | 2 | 3 | 6 | 5 | 4 | 2 | 3 | 3 | 6 | 2 | 6 |
Most frequently occurring
| | SEMIO_DOM | SEMIO_ERL | SEMIO_FAM | SEMIO_KAEM | SEMIO_KRIT | SEMIO_KULT | SEMIO_LUST | SEMIO_MAT | SEMIO_PFLICHT | SEMIO_RAT | SEMIO_REL | SEMIO_SOZ | SEMIO_TRADV | SEMIO_VERT | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 53413 | 6 | 3 | 6 | 6 | 7 | 3 | 5 | 5 | 5 | 4 | 7 | 2 | 3 | 1 | 73961 |
| 8153 | 2 | 4 | 6 | 3 | 5 | 5 | 1 | 6 | 4 | 5 | 4 | 3 | 4 | 7 | 4110 |
| 12782 | 3 | 3 | 6 | 2 | 3 | 4 | 5 | 6 | 4 | 1 | 2 | 3 | 1 | 7 | 3285 |
| 57986 | 6 | 6 | 1 | 7 | 7 | 1 | 1 | 2 | 4 | 4 | 1 | 2 | 3 | 4 | 2646 |
| 56608 | 6 | 5 | 5 | 7 | 7 | 3 | 2 | 7 | 5 | 6 | 5 | 2 | 7 | 4 | 2530 |
| 3172 | 2 | 1 | 7 | 2 | 1 | 7 | 2 | 7 | 5 | 4 | 6 | 7 | 5 | 7 | 2331 |
| 54310 | 6 | 4 | 4 | 6 | 7 | 3 | 2 | 4 | 5 | 6 | 5 | 2 | 6 | 1 | 2157 |
| 62546 | 6 | 7 | 1 | 6 | 7 | 1 | 5 | 4 | 4 | 4 | 4 | 2 | 1 | 4 | 1844 |
| 8233 | 2 | 4 | 6 | 3 | 5 | 5 | 4 | 6 | 4 | 5 | 4 | 3 | 4 | 7 | 1799 |
| 3229 | 2 | 1 | 7 | 2 | 2 | 7 | 2 | 5 | 7 | 4 | 7 | 7 | 5 | 6 | 1299 |